Abstract: Feature selection is important for high-dimensional datasets. The best subset contains the fewest dimensions that contribute most to accuracy; the remaining, unimportant dimensions are ignored. Selecting relevant features from unlabelled data is a challenging task due to the absence of label information by which feature relevance can be assessed. The unique characteristics of IT logs further complicate the already challenging problem of unsupervised feature selection (e.g., part of IT log data is linked, which invalidates the independent and identically distributed assumption), posing new challenges for traditional unsupervised feature selection algorithms. In this paper, we compare the performance of the Linked Unsupervised Feature Selection algorithm [1] and feature selection using feature similarity [2]. We perform experiments on an IT log dataset to evaluate the effectiveness of both frameworks.
Keywords: IT log analysis, high-dimensional, unlabelled data, attribute-value.